Indonesian Journal of Electrical Engineering and Computer Science 
Vol. 12, No. 1, October 2018, pp. 127~136 
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v12.i1.pp127-136


Arrhythmia Classification Based on Combined Chaotic and Statistical Feature Extraction


G. Jayagopi, S. Pushpa 


St.Peter’s Institute of Higher Education and Research, Chennai, India 








Article Info

Article history:
Received Feb 6, 2018
Revised May 8, 2018
Accepted Jun 14, 2018

Keywords:
Arrhythmia
Chaos
Kurtosis
Skewness
SVM

ABSTRACT

Clear information content in the electrocardiogram (ECG) is essential for revealing abnormalities in heart function. Arrhythmia is a commonly seen heart disorder that can be fatal if it is not identified and treated properly within time limits. The straightforward approach to such diagnosis is to detect salient features from the ECG data using signal processing methods, followed by a suitable classification method. In this work, 16 classes of arrhythmia have been classified by adopting the traditional method of abnormality detection while introducing novelty in the type of features extracted. Lyapunov exponents, Kolmogorov-Sinai entropy density, Kolmogorov-Sinai entropy universality, and R-R interval features together with their kurtosis and skewness have been used to classify the heart beats of the benchmark MIT-BIH Arrhythmia database. Because these alternative features are used, a conventional Support Vector Machine (SVM) based classifier achieves an accuracy of 98.95% with just 13 features.
1. INTRODUCTION

Though arrhythmias do not always cause sudden death, they are a sign of heart abnormality that may become lethal in due course. Besides automating the detection of abnormalities in heart function, digital signal processing methods help doctors with accurate diagnosis and subsequent therapy. Since an abnormality is identified from the electrocardiogram (ECG) morphology and from the time instants at which various peaks and peculiar shapes occur, manual diagnosis over a large database would be difficult and inaccurate. Therefore, signal processing techniques offer substantial benefits in biomedical instrumentation. Automatic ECG analysis has attracted researchers in recent decades, and these methodologies have had considerable positive impact. Unfortunately, the results are highly sensitive to noise in the ECG data, and accuracy and computation time remain concerns. Moreover, classification accuracy tends to decrease as the number of classes handled increases.

The morphological and temporal characteristics of the ECG signal vary considerably among test subjects. Naturally, this creates challenges for automatic analysis and classification through digital signal processing techniques. All such techniques involve a feature extraction method followed by a classification method in order to capture the complete characteristics of the ECG signal. Time domain analysis [1-4], frequency domain analysis [5], time-frequency analysis [6-8], statistics-based analysis [9-11] and hybrid feature-based methods [12-13] are common in the study of ECG signals. The extracted features are classified using neural networks, neuro-fuzzy systems [14], Linear Discriminant Analysis (LDA) [15] and Support Vector Machines (SVM) [16-17] to detect irregularities in heart function. In ECG classification, the merit of a work is judged largely by the number of classes involved in the diagnosis. Many works in earlier decades focused only on determining whether a heart beat is normal or abnormal; later research gradually classified more abnormalities. Premature ventricular contraction (PVC) beats were distinguished from Normal (N), Premature Atrial Contraction (PAC) and Right Bundle Branch Block (RBBB) beats through chaotic feature extraction. Experiments conducted with data from the MIT-BIH database produced an accuracy of 99.1%. However, this work could handle only 4 classes, including the normal case. Diagnosis of PVC is of great importance in goal-directed treatment and preoperative prognosis [18].

While this is the commonly followed procedure, fusion of feature extraction and classification using a one-dimensional (1D) Convolutional Neural Network (CNN) has also been attempted [19]. Using a dedicated CNN, long ECG recordings of a particular patient could be classified. Although this work removes the need for hand-crafted features, it is a patient-specific system intended to generate early alerts on a lightweight wearable device. Also, its classification performance was superior only for Ventricular Ectopic Beats (VEB) and Supraventricular Ectopic Beats (SVEB) when tested on the MIT-BIH arrhythmia benchmark database.

ECG signal analysis identifies not only heart disorders but also breathing disorders such as sleep apnea [20]. That work used variational mode decomposition instead of the expensive and time-consuming gold-standard polysomnogram. Using energy and R-R interval features from the variational mode functions, it achieved accuracies of 97.5% and 95% for the online and offline processes, respectively, and healthy and apnea subjects were classified using SVM. Another work used R-R interval and frequency domain features, classified with Radial Basis Function (RBF) and Spline Activated Feed Forward Neural Network (SAFFNN) classifiers. However, these features were not sufficient, and it yielded only 90.85% accuracy in classifying the arrhythmias of the MIT-BIH database [21].

A similar work was based on frequency domain features: an Artificial Bee Colony (ABC) optimized Least Squares Support Vector Machine (LSSVM) classifier with an RBF kernel was proposed [22], with Intrinsic Mode Functions (IMF) used as bandwidth features. However, its classification accuracy did not exceed 94.61%. Classification accuracy is crucial not only in arrhythmia detection but also in ECG-based biometric recognition, where existing works also suffer in accuracy even though they used a multitask learning approach for feature extraction and classification [23]. Clearly, the end result depends on intelligible features and on sufficient distance between the feature magnitudes of different classes. Most works report classification accuracy above 90%, but the total number of classes in such claims is less than 10, and the complexity of classification grows with the number of classes. This is seen in [24], where myocardial infarction (MI), Heart Muscle Disease (HMD) and Bundle Branch Block (BBB) were the three cases classified using Complex Wavelet Sub-Band Bi-spectrum (CWSB) features from 12-lead ECG. Experimental results show that the CWSB features of the 12-lead ECG with an SVM classifier and RBF kernel function yielded individual accuracies of 98.37%, 97.39% and 96.40% for the MI, HMD and BBB classes, respectively.

Another study [25] obtained 95% accuracy in classifying cardiac conditions such as myocardial infarction, cardiomyopathy and myocarditis, but only 3 classes were handled, using a General Regression Neural Network (GRNN). Long-term accumulated patient ECG data produced 88% accuracy with an efficiency improvement on the order of 450 times. However, this work suffers from a very small number of classes. Classifying normal beats, supraventricular ectopic beats, bundle branch ectopic beats, ventricular ectopic beats, fusion beats and unknown beats was attempted in [26]. The QRS complexes of the ECG waveform were converted into a Fourier spectrum, and power variations were observed within the 0-20 Hz band. Grey Relational Analysis (GRA) was performed to classify these abnormalities on the MIT-BIH arrhythmia benchmark database. However, this noninvasive method is limited to only 6 classes including the normal beat. This major drawback is due to the fact that the feature used is based only on the power spectrum of the frequency domain signal.

The work in [27] classified 7 arrhythmias (PVC, Atrial Fibrillation (AF), Complete Heart Block (CHB), Left Bundle Branch Block (LBBB), Normal Sinus Rhythm (NSR), Ventricular Fibrillation (VF) and Ventricular Tachycardia (VT)). After computing Heart Rate Variability (HRV), 14 time domain, frequency domain, nonlinear and chaotic features were extracted to train Multi-Layer Perceptron (MLP) neural networks, and Generalized Discriminant Analysis (GDA) was used for dimension reduction prior to training. Although the training set was filtered by deleting confusing data, the overall classification accuracy of 95% to 100% is limited to only 7 arrhythmia classes of the MIT-BIH database. The use of well-defined nonlinear dynamic Lyapunov exponents for ECG analysis was introduced in [28]. Four types of ECG beats (normal, congestive heart failure, ventricular tachyarrhythmia and atrial fibrillation beats) obtained from the PhysioBank database were classified using Recurrent Neural Networks (RNN) trained with the Levenberg-Marquardt algorithm, based on the maximum, minimum, mean and standard deviation of the Lyapunov exponents of each ECG beat. The classification accuracy obtained was 94.72%. However, the major drawbacks are the limited number of classes (four) and the use of only the Lyapunov exponent, discarding other chaotic metrics that could have served as efficient features.

The most suitable state-of-the-art comparison in arrhythmia classification is [29], as it involves all 16 classes of the MIT-BIH database with a classification accuracy of 98.82%. Those researchers obtained this accuracy through a new approach, i.e., a discrete orthogonal Stockwell transform computed with the discrete cosine transform for efficient representation of the ECG signal in time-frequency space. Further, dimension reduction was performed using principal component analysis to represent the morphological characteristics of the ECG signal. In addition, dynamic R-R interval features were computed and concatenated to form the final feature set. In total, 20 features were extracted to classify the MIT-BIH arrhythmia database using an SVM classifier optimized through Particle Swarm Optimization (PSO). The experimental results yielded an improved overall accuracy, sensitivity (Se) and positive predictivity (Pp) of 98.82% compared with the conventional approaches available before that work.

Heart beat detection based on the Pythagoras theorem was performed in [30]. However, that algorithm does not classify the various abnormalities in the ECG. The works in [31] and [32] are useful to researchers for preprocessing ECG signals.

A careful review of the earlier literature on arrhythmia classification shows that the foremost problem is to obtain good classification accuracy when more classes of ECG abnormalities are involved. The researcher must choose a feature extraction method that represents the complete morphological characteristics of the various classes of ECG signals. As a novel approach, statistical features and chaotic metrics are concatenated with conventional R-R interval features into a feature set, on which classification is performed using Support Vector Machines (SVM).


2. RESEARCH METHOD 
2.1. MIT-BIH Arrhythmia Database

For all the experiments conducted, the 48 ECG records of 47 different subjects studied by the BIH Arrhythmia Laboratory have been used [33].


Table 1. Datasets Summary

ECG signal form                              Annotation   Total  Training    Test
Normal (NOR)                                 N            75017     11253   63764
Left bundle branch block (LBBB)              L             8072      2825    5247
Right bundle branch block (RBBB)             R             7255      2539    4716
Atrial premature contraction (APC)           A             2546       891    1655
Premature ventricular contraction (PVC)      V             7129      2495    4634
Paced beat (PACE)                            P             7024      2458    4566
Aberrated atrial premature beat (AP)         a              150        75      75
Ventricular flutter (VF)                     !              472       236     236
Fusion of ventricular and normal beat (VFN)  F              802       401     401
Blocked atrial premature beat (BAP)          x              193        97      96
Nodal (junctional) escape beat               j              229       115     114
Fusion of paced and normal beat (FPN)        f              982       491     491
Ventricular escape beat (VE)                 E              106        53      53
Nodal (junctional) premature beat (NP)       J               83        42      41
Atrial escape beat (AE)                      e               16         8       8
Unclassifiable beat (UN)                     Q               33        17      16
Total                                                    110109     23996   86113





The MIT-BIH database contains 110109 beat labels. The recordings were band-pass filtered at 0.1-100 Hz and digitized at 360 samples/s, with each sample over a 10 mV range represented with 11-bit resolution. The modified limb lead of the database has been used, with heart beat segments obtained using a window around each R-peak. The ground truth is obtained from the class annotations provided with the benchmark database. The summary of the data sets and the details of the 16 ECG signal classes are provided in Table 1. To maintain generality, as in the state-of-the-art work, ECG signals from each of the 16 classes are chosen randomly to constitute the training and testing data sets by dividing the whole data set into 16 groups, each representing one category. Specifically, 15% of the normal category, 35% of each of the 'L', 'A', 'R', 'V' and 'P' categories, and about 50% of each of the remaining ten classes (see Table 1) are selected randomly for the training data set, i.e., a total of 21.79% of all events (a small proportion for training) are used for training, while the remaining ECG signals are used for testing the proposed method.
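The per-class random split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `stratified_split`, the fixed seed and the rounding rule are assumptions. The fractions for N (15%) and for L, A, R, V, P (35%) follow the text; the default of 0.5 for the remaining ten classes is an assumption based on the counts in Table 1 (e.g. 75 of 150 'a' beats and 401 of 802 'F' beats go to training).

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions, default_fraction, seed=0):
    """Randomly split beat indices into training and test sets, class by class.

    labels           -- per-beat annotation symbols, e.g. 'N', 'L', 'a'
    fractions        -- dict mapping a class symbol to its training fraction
    default_fraction -- training fraction for classes missing from `fractions`
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label, indices in by_class.items():
        frac = fractions.get(label, default_fraction)
        n_train = round(frac * len(indices))  # rounding rule is an assumption
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)


# Training fractions named in the text for the six large classes.
FRACTIONS = {"N": 0.15, "L": 0.35, "A": 0.35, "R": 0.35, "V": 0.35, "P": 0.35}

if __name__ == "__main__":
    toy_labels = ["N"] * 100 + ["L"] * 20 + ["a"] * 10  # hypothetical labels
    train_idx, test_idx = stratified_split(toy_labels, FRACTIONS, default_fraction=0.5)
    print(len(train_idx), len(test_idx))   # 15 + 7 + 5 = 27 training, 103 test
    # Overall training share implied by the totals in Table 1:
    print(f"{100 * 23996 / 110109:.2f}%")  # 21.79%
```

Splitting inside each class, rather than over the pooled data, keeps rare classes such as 'e' and 'Q' present in both sets.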


2.2. Lyapunov Exponents 

Lyapunov exponents (LEs) are very useful in analyzing dynamical systems. They measure how sensitively trajectories in phase space diverge or converge with respect to the initial conditions. A system with at least one positive exponent is considered to be in the chaotic region. The LE measures how quickly the lattice states diverge per time iteration and is given by Equation (1).


$$\lambda(i) = \lim_{n \to \infty} \frac{1}{n} \ln \lvert \sigma(i) \rvert \tag{1}$$

where n is the number of iterations and λ(i) is the i-th LE. The λ(i) are calculated from the eigenvalues σ(i) of R_n, as given in [34]. R_n is obtained using Equation (2) from the Jacobian matrices J_i, which are constructed from the lattice values at each iteration as done in [35]:

$$R_n = \prod_{i=1}^{n} J_i \tag{2}$$


After the LE values are calculated, lattices with positive exponents are considered to be in the chaotic region. In this work, the lattice values are simply the time series values obtained from the test database. The sum of the Lyapunov exponents reveals the damping nature of a system, and any change in damping can be monitored through the LE. LE can be calculated in many ways; the one given in Equation (1) applies to discrete-time systems. A few other approaches for calculating LE from a continuous time series are reported below. LE and instantaneous Lyapunov exponents (ILE) were computed using phase-space and tangent-space approaches in [36]. The algorithm developed in [37] introduced short-term averaged Lyapunov exponents (SLE), which are needed when the ILE estimated from an experimental time series are inaccurate due to computational errors. Local Lyapunov exponents (LLE), a concept similar to SLE, were proposed in [38]. It is convenient to model a continuous-time dynamical system by ordinary differential equations of the form given in Equations (3) and (4).


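$$\frac{d\mathbf{x}}{dt} = \mathbf{f}(x_1, x_2, x_3, \ldots, x_n) \tag{3}$$

$$\frac{d\mathbf{x}}{dt} = \left[\frac{dx_1}{dt}, \frac{dx_2}{dt}, \ldots, \frac{dx_n}{dt}\right]^{T} \tag{4}$$

where x = [x_1, x_2, ..., x_n]^T.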


The above equations define a set of trajectories in phase space. The i-th Lyapunov exponent is calculated as given in Equation (5):

$$\lambda_i = \lim_{t \to \infty} \frac{1}{t} \ln \frac{p_i(t)}{p_i(0)} \tag{5}$$

where p_i(t) is the length of the i-th principal axis of an infinitesimal sphere of initial conditions after it has evolved for time t, and the exponents are ordered from largest to smallest. Since the limit requires an infinite integration time, which is not possible for a finite time series, LE is calculated over a finite time t as given in Equation (6):

$$\lambda_i \approx \frac{1}{t} \ln \frac{p_i(t)}{p_i(0)} \tag{6}$$


LE indicates how nearby orbits diverge as a result of their initial conditions. Similar methods of calculating Lyapunov exponents have been described in [39-42].
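Equations (1) and (2) can be illustrated on a discrete map whose Jacobian is known. The sketch below is an assumption-laden illustration, not the authors' ECG procedure, which the text does not fully specify. It uses the Hénon map as a stand-in test system and, instead of taking eigenvalues of the explicit product R_n (which overflows or underflows for large n), it applies the standard QR (Gram-Schmidt) re-orthonormalisation technique to the same Jacobian product. All function names are hypothetical. For a scalar ECG series, a delay-embedding estimator would typically be needed instead. The maximum, minimum, mean and standard deviation of the returned spectrum correspond to the four LE-based features used later in this paper.

```python
import math
import statistics


def henon_step(state, a=1.4, b=0.3):
    """One iteration of the Henon map (stand-in test system)."""
    x, y = state
    return [1.0 - a * x * x + y, b * x]


def henon_jacobian(state, a=1.4, b=0.3):
    """Jacobian J of the Henon map evaluated at `state`."""
    x, _ = state
    return [[-2.0 * a * x, 1.0],
            [b, 0.0]]


def mat_vec(m, v):
    """Matrix-vector product for small lists of lists."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def gram_schmidt(vectors):
    """Modified Gram-Schmidt: orthonormal vectors plus the diagonal of R."""
    ortho, diag = [], []
    for v in vectors:
        w = list(v)
        for q in ortho:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        ortho.append([wi / norm for wi in w])
        diag.append(norm)
    return ortho, diag


def lyapunov_spectrum(step, jacobian, x0, n_iter=20000, n_transient=1000):
    """Estimate all Lyapunov exponents of a map, largest first.

    Tangent vectors are pushed through J_1, J_2, ..., J_n (the product R_n of
    Eq. (2)) and re-orthonormalised at every step; the accumulated log
    stretching factors divided by n approximate Eq. (1)."""
    x = list(x0)
    for _ in range(n_transient):  # let the orbit settle on the attractor
        x = step(x)
    dim = len(x)
    basis = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    log_sums = [0.0] * dim
    for _ in range(n_iter):
        jac = jacobian(x)
        basis, diag = gram_schmidt([mat_vec(jac, v) for v in basis])
        log_sums = [s + math.log(d) for s, d in zip(log_sums, diag)]
        x = step(x)
    return sorted((s / n_iter for s in log_sums), reverse=True)


if __name__ == "__main__":
    spectrum = lyapunov_spectrum(henon_step, henon_jacobian, x0=[0.0, 0.0])
    print(spectrum)  # about [0.42, -1.62]; published values are ~0.419 and ~-1.623
    le_features = {
        "max_le": max(spectrum),
        "min_le": min(spectrum),
        "mean_le": statistics.mean(spectrum),
        "std_le": statistics.pstdev(spectrum),  # estimator choice is an assumption
    }
    print(le_features)
```

A useful sanity check: the Hénon Jacobian has |det J| = b everywhere, so the exponents must sum to ln(0.3) regardless of how well each individual exponent has converged.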


2.3. Kolmogorov Sinai Entropy Density 

The proposed spatiotemporal chaotic system can be considered as L-dimensional dynamics, and the Kolmogorov-Sinai entropy (KSE) of the L-dimensional dynamics is the sum of its positive LEs. To eliminate the effect of the number of lattices, the Kolmogorov-Sinai entropy density is employed here, as presented in Equation (7):

$$h = \frac{1}{L}\sum_{i=1}^{L} \lambda^{+}(i), \qquad \lambda^{+}(i) = \max\{\lambda(i), 0\} \tag{7}$$

where h is the KSE density and the sum runs over the positive LE values only [43].
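Equation (7) reduces to a few lines of code. The sketch below assumes one Lyapunov exponent per lattice, so that L is simply the length of the exponent list; the function name and the example spectrum are hypothetical.

```python
def kse_density(exponents):
    """Kolmogorov-Sinai entropy density h of Eq. (7).

    exponents -- Lyapunov exponents of the L lattices (one per lattice)
    Returns the sum of the positive exponents divided by L."""
    if len(exponents) == 0:
        raise ValueError("at least one Lyapunov exponent is required")
    positive_sum = sum(lam for lam in exponents if lam > 0.0)
    return positive_sum / len(exponents)


if __name__ == "__main__":
    spectrum = [0.4, 0.1, -0.2, -1.0]  # hypothetical spectrum, L = 4
    print(kse_density(spectrum))       # (0.4 + 0.1) / 4 = 0.125
```

Dividing by L is what makes h comparable between systems with different numbers of lattices; a spectrum with no positive exponent gives h = 0.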


2.4. Kolmogorov-Sinai Entropy Universality 

The Kolmogorov-Sinai entropy density indicates whether or not the spatiotemporal chaotic system is in chaos. However, a positive KSE density does not show what proportion of the L lattices is chaotic. Therefore, the KSE generality (or universality) h_u is employed, as given in Equation (8):

$$h_u = \frac{L'}{L} \tag{8}$$

where h_u is the KSE generality and L' is the number of positive Lyapunov exponents in the proposed spatiotemporal chaotic system [43]. The KSE generality is the fraction of lattices in chaos, which evaluates the spatial complexity of the L-dimensional dynamics [44].
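A minimal sketch of Equation (8), again assuming one exponent per lattice (the function name and example spectra are hypothetical). The second example shows why h_u complements the density: the spectra [0.5, -1, -1, -1] and [0.25, 0.25, -1, -1] have the same KSE density, 0.5/4 = 0.125, but different fractions of chaotic lattices.

```python
def kse_universality(exponents):
    """KSE generality (universality) h_u = L'/L of Eq. (8): the fraction of
    the L lattices whose Lyapunov exponent is positive."""
    if len(exponents) == 0:
        raise ValueError("at least one Lyapunov exponent is required")
    n_positive = sum(1 for lam in exponents if lam > 0.0)
    return n_positive / len(exponents)


if __name__ == "__main__":
    print(kse_universality([0.4, 0.1, -0.2, -1.0]))    # 2 of 4 positive -> 0.5
    print(kse_universality([0.5, -1.0, -1.0, -1.0]))   # 1 of 4 positive -> 0.25
    print(kse_universality([0.25, 0.25, -1.0, -1.0]))  # 2 of 4 positive -> 0.5
```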


2.5. Standard Deviation 

Standard deviation is a measure of the dispersion of the data about its mean [45]. The lower the standard deviation, the closer the data points tend to lie to the mean, and vice versa. The standard deviation of the given matrix is

$$std = \sqrt{\frac{1}{MN-1}\left(\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} I(x,y)^2 - \frac{1}{MN}\Big(\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} I(x,y)\Big)^2\right)} \tag{9}$$

where M = 1 is the number of rows, N is the number of columns, and I(x,y) is the data value at row x and column y.
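With M = 1, Equation (9) is the ordinary sample standard deviation of a window of length N. The sketch below follows the one-pass form of Eq. (9) literally; the function name and the example R-R values are hypothetical.

```python
import math
import statistics


def window_std(values):
    """Sample standard deviation of a 1 x N window, following Eq. (9)
    with M = 1, so M*N is just the window length."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are required")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    # Eq. (9): (sum of squares - (sum)^2 / MN) / (MN - 1), then square root.
    variance = (sum_sq - total * total / n) / (n - 1)
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off


if __name__ == "__main__":
    rr = [0.80, 0.82, 0.78, 0.85, 0.79]   # hypothetical R-R intervals in seconds
    print(window_std(rr), statistics.stdev(rr))  # agree up to round-off
```

The one-pass form can lose precision when the mean is large relative to the spread; Python's `statistics.stdev` uses a more stable computation and serves as a cross-check.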


2.6. Kurtosis 

Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly and have heavy tails, whereas data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak [45]. The kurtosis of the selected window is computed as

$$kurtosis = \frac{MN(MN+1)}{(MN-1)(MN-2)(MN-3)}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left(\frac{I(x,y)-\mu}{std}\right)^{4} - \frac{3(MN-1)^{2}}{(MN-2)(MN-3)} \tag{10}$$

where M = 1 is the number of rows, N is the number of columns, μ is the mean of the window and std is given by Equation (9).
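Equation (10) is the bias-corrected sample excess kurtosis. A direct transcription is sketched below (the function name is hypothetical). Note, as a hedged aside, that MATLAB's default `kurtosis(x)` is neither bias-corrected nor reduced by 3 (a normal distribution gives about 3), so it will not match Eq. (10) without adjustment.

```python
import math


def window_kurtosis(values):
    """Bias-corrected excess kurtosis of a 1 x N window, following Eq. (10).

    For large samples from a normal distribution the value is near 0,
    because the -3(MN-1)^2 / ((MN-2)(MN-3)) term removes the baseline of 3."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four samples are required")
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # Eq. (9)
    if std == 0.0:
        raise ValueError("kurtosis is undefined for a constant window")
    fourth = sum(((v - mean) / std) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    offset = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - offset


if __name__ == "__main__":
    print(window_kurtosis([1, 2, 3, 4, 5]))  # approximately -1.2 (flat, light-tailed)
```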


2.7. Skewness 

Skewness is a measure of the asymmetry of the data. Qualitatively, a negative skewness indicates that the tail on the left side of the distribution (histogram) is longer than the tail on the right side, and the bulk of the values (including the median) lie to the right of the mean. A positive skewness indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean [45]. The skewness of the given matrix is

$$skewness = \frac{MN}{(MN-1)(MN-2)}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left(\frac{I(x,y)-\mu}{std}\right)^{3} \tag{11}$$

where M = 1 is the number of rows, N is the number of columns, μ is the mean of the window and std is given by Equation (9).
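Equation (11) is the adjusted (bias-corrected) sample skewness. The sketch below transcribes it directly; the function name is hypothetical. In MATLAB this should correspond to `skewness(x, 0)`, whereas the default `skewness(x)` returns the biased estimate.

```python
import math


def window_skewness(values):
    """Bias-corrected sample skewness of a 1 x N window, following Eq. (11)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three samples are required")
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # Eq. (9)
    if std == 0.0:
        raise ValueError("skewness is undefined for a constant window")
    third = sum(((v - mean) / std) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * third


if __name__ == "__main__":
    print(window_skewness([1, 2, 3, 4, 5]))  # symmetric window: ~0
    print(window_skewness([1, 2, 3, 10]))    # long right tail: positive
    print(window_skewness([1, 8, 9, 10]))    # long left tail: negative
```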


2.8. R-R Interval Features 

By nature, the pumping of blood is not synchronized to any standard clock. Depending on an individual's biological clock, the heart rhythm can vary. This variation is usually captured by the R-R interval between two heart beats, which is a good representative of the dynamic characteristics of the ECG signal. Four R-R features corresponding to the pattern of the ECG signal are computed, namely pre R-R, post R-R, local R-R and average R-R interval. In this paper, the interval between the previous R-peak and the current R-peak gives the pre R-R feature, while the interval between the current R-peak and the following R-peak gives the post R-R feature. Together, the pre and post R-R interval features describe the instantaneous rhythm. The average R-R interval feature is derived by averaging the R-R intervals over the past 3-min episode of a particular event, and the local R-R feature is derived by averaging all R-R intervals over the past 8-s episode. The local and average features represent the average characteristics of a series of ECG beats. Further, kurtosis, skewness and standard deviation are calculated from the obtained R-R intervals. Finally, all these dynamic features are concatenated. As a result, 13 hybrid features (four R-R interval features; six chaotic features, namely maximum LE, minimum LE, average LE, standard deviation of LE, KSE density and KSE generality; and three statistical features) are determined to represent each denoised input ECG and are further processed for classification. Prior to the extraction of these features, the ECG data are filtered using a low pass filter with a cut-off frequency of 400 Hz to remove unwanted noise.
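The four R-R interval features described above can be sketched as follows, working from R-peak sample indices at the MIT-BIH rate of 360 samples/s. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the first and last beats are skipped because they lack a pre or post interval, and a past window is taken to contain every R-R interval that ends less than 8 s (local) or 180 s (average) before the current R-peak, including the current pre R-R interval.

```python
from statistics import mean

MITBIH_FS = 360.0  # MIT-BIH sampling rate in samples/s (Section 2.1)


def _recent_mean(rr, times, i, window_s):
    """Mean of the R-R intervals ending less than `window_s` s before beat i.

    rr[k] is the interval that ends at times[k + 1]; the loop walks backwards
    from the current pre R-R interval (k = i - 1) and stops at the window edge."""
    recent = []
    k = i - 1
    while k >= 0 and times[i] - times[k + 1] < window_s:
        recent.append(rr[k])
        k -= 1
    return mean(recent)


def rr_interval_features(r_peaks, fs=MITBIH_FS, local_window_s=8.0,
                         average_window_s=180.0):
    """Pre, post, local and average R-R intervals (in seconds) for each beat.

    r_peaks -- increasing R-peak sample indices"""
    times = [p / fs for p in r_peaks]
    rr = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    features = []
    for i in range(1, len(times) - 1):
        features.append({
            "pre": rr[i - 1],  # previous R-peak -> current R-peak
            "post": rr[i],     # current R-peak -> next R-peak
            "local": _recent_mean(rr, times, i, local_window_s),
            "average": _recent_mean(rr, times, i, average_window_s),
        })
    return features


if __name__ == "__main__":
    # Hypothetical R-peaks: one beat per second, then a 1.5 s interval.
    for beat in rr_interval_features([0, 360, 720, 1080, 1620]):
        print(beat)
```

The standard deviation, kurtosis and skewness of the R-R window, Equations (9) to (11), would then be computed on the `rr` list and appended to these four values.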

All the aforementioned features are extracted from the MIT-BIH benchmark database using MATLAB toolboxes. Proper use of MATLAB functions (both built-in and user-defined) and toolboxes such as the Statistics Toolbox, the Digital Signal Processing Toolbox and mathematical toolboxes allows ECG signals to be processed and analyzed, both in real time and in simulation, with good accuracy and convenience [46]. The evaluation of the statistical features was inspired by [45] and [47]. Kurtosis, skewness and standard deviation are calculated over the window of R-R intervals. Since automatic knowledge discovery is essential for the proposed arrhythmia classification, a chaotic map algorithm is employed to recognize patterns based on the chaotic metrics, as shown in Figure 1. This chaotic map algorithm has been reported to classify normal and abnormal patterns efficiently with good sensitivity and specificity [48]. In the proposed method, KSE density and KSE universality are extracted as additional features to improve the classification performance.





Figure 1. Block diagram of the proposed system 


3. RESULTS AND ANALYSIS 
3.1. Performance Evaluation Parameters 

The performance for each class of event is estimated by computing the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) counts, where TP and TN represent correct classifications: TP counts beats of the class that are assigned to it, and TN counts beats of other classes that are not assigned to it. On the basis of these counts, the performance metrics for each class of signal are calculated, namely sensitivity, specificity and positive predictivity. Sensitivity is the rate of correctly classified events among all events of the class, whereas positive predictivity is the rate of correctly classified events among all events detected as that class. Using these definitions, sensitivity (Se) and positive predictivity (Pp) are defined as

$$Se = \frac{TP}{TP+FN}\times 100, \qquad P_p = \frac{TP}{TP+FP}\times 100 \tag{12}$$


The overall accuracy and error rate are defined in Equations (13) and (14), respectively.

$$\text{Classification accuracy}\ (\%) = \frac{\text{Total correctly classified data}}{\text{Total number of data}} \times 100 \tag{13}$$

$$\text{Error rate}\ (\%) = \frac{\text{Total misclassified data}}{\text{Total number of data}} \times 100 \tag{14}$$


All the above parameters are computed from the simulations carried out on the MIT-BIH database.
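Equations (12) to (14) can all be computed from a confusion matrix laid out like Table 2 (rows are ground truth, columns are predictions). The sketch below is illustrative: the function name and the 3-class matrix are hypothetical, not data from this study.

```python
def per_class_metrics(confusion, labels):
    """Per-class TP, FN, FP, TN, Se and Pp (Eq. 12) plus overall accuracy
    and error rate (Eqs. 13-14).

    confusion[i][j] -- number of beats whose ground truth is labels[i] and
                       whose predicted class is labels[j] (Table 2 layout)
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    per_class = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                       # rest of row i
        fp = sum(confusion[r][i] for r in range(n)) - tp  # rest of column i
        tn = total - tp - fn - fp                         # everything else
        se = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
        pp = 100.0 * tp / (tp + fp) if tp + fp else float("nan")
        per_class[label] = {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "Se": se, "Pp": pp}
    accuracy = 100.0 * correct / total
    error_rate = 100.0 * (total - correct) / total
    return per_class, accuracy, error_rate


if __name__ == "__main__":
    cm = [[50, 2, 1],   # hypothetical 3-class example
          [3, 40, 0],
          [0, 1, 20]]
    metrics, acc, err = per_class_metrics(cm, ["N", "V", "A"])
    print(metrics["N"])  # TP=50, FN=3, FP=3
    print(f"accuracy={acc:.2f}%, error={err:.2f}%")
```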
3.2. Results and Analysis

The proposed feature extraction and classification methods discussed in the previous section are implemented using the mathematical and statistical toolboxes of MATLAB (version 9.0, R2016a) on a Windows 8 Pro platform (AMD E1-2100 processor, 1 GHz, 4 GB RAM) for the analysis of ECG signals. The experiments are carried out and validated on the benchmark MIT-BIH arrhythmia database. The SVM classifier is trained on the training data set given in Table 1, and its performance is analyzed for each tested ECG signal. The classification of the tested ECG signals into their respective categories by the proposed methodology is presented as a confusion matrix in Table 2.

In classification work in general, increasing the number of training signals tends to increase classification accuracy. In [37], five classes of ECG signals are classified with an overall average accuracy of 93.48%; that work classifies higher-order (third-order) statistical features using least-squares SVMs with a limited number of test signals. However, the experiments in [37] are performed only on selected records and beats, a selection the respective researchers do not justify. The advantage of the proposed feature set is that it combines morphological features with dynamic features, i.e., features based on R-R interval information and chaotic features, which represent different characteristics of the ECG signal. This combination yielded improved classification accuracy. Nonetheless, the computational complexity of the proposed methodology needs to be evaluated for real-time applications. In addition, the proposed technique is validated on all the ECG data (i.e., without excluding any segment) of the benchmark MIT-BIH arrhythmia database with only 21.8% training data, which reduces training time and memory consumption on the hardware; this matters because the running time of the algorithm is a real constraint during training.


3.3. Confusion Matrix for Proposed Model 

To explain the confusion matrix, the normal class is taken as an example. The first row corresponds to the normal category: of the 63764 actual normal signals, 63187 are correctly detected as normal by the proposed methodology, and the rest are misclassified into other categories. In column 1, 63328 signals are detected as normal; of these, 63187 are correctly classified normal signals and the remaining 141 are signals from other classes misclassified into the normal category. The classification results for the other 15 categories of ECG signals are calculated in the same way and presented in Table 3. Overall, out of 86113 test signals, 85209 are correctly classified and 904 are misclassified across the 16 classes of ECG signals. The accuracy and error rate of the proposed methodology, computed using (13) and (14), are 98.95% and 1.05%, respectively.


Table 2. Confusion Matrix for SVM Model
Correctly classified instances: 85209, Misclassified instances: 904, Error: 1.05%, Accuracy: 98.95%
Rows: ground truth class; columns: predicted class

Class      N      L      R      A      V      P      a      !      F      x      j      f      E      J      e      Q    Total
N      63187     32      0    286    139      0     27      0     71      0     15      7      0      0      0      0    63764
L         10   5222      0      0     15      0      0      0      0      0      0      0      0      0      0      0     5247
R         70      0   4640      0      6      0      0      0      0      0      0      0      0      0      0      0     4716
A          8      7      0   1619      4      0      0     13      4      0      0      0      0      0      0      0     1655
V         21      5      0      0   4584      0      0      5     19      0      0      0      0      0      0      0     4634
P          0      0     27      0      0   4509      0      0      0      0      0     30      0      0      0      0     4566
a          0      0      0      0      4      0     71      0      0      0      0      0      0      0      0      0       75
!          0      0      0      0     25      0      0    211      0      0      0      0      0      0      0      0      236
F          8      0      0      0      5      0      0      0    388      0      0      0      0      0      0      0      401
x          9      0      0      2      0      0      0      0      0     85      0      0      0      0      0      0       96
j          0      0      0      1      0      0      0      0      0      0    111      0      0      2      0      0      114
f          1      0      0      0      0      6      0      0      0      0      0    484      0      0      0      0      491
E          4      0      0      0      0      0      0      0      0      0      0      0     49      0      0      0       53
J          2      0      0      0      0      0      0      0      0      0      0      0      0     39      0      0       41
e          5      0      0      0      0      0      0      0      0      0      0      0      0      0      3      0        8
Q          3      0      0      0      2      0      0      0      0      0      0      4      0      0      0      7       16
Total  63328   5266   4667   1908   4784   4515     98    229    482     85    126    525     49     41      3      7    86113
Table 3. Performance of Each Class of ECG

Class  Trained beats Test beats    FN     TP    FP  Se (%)  Pp (%) Accuracy (%)
N              11253      63764   577  63187   141   99.09   99.77        99.09
L               2825       5247    25   5222    44   99.52   99.16        99.52
R               2539       4716    76   4640    27   98.38   99.42        98.38
A                891       1655    36   1619   289   97.82   84.85        97.82
V               2495       4634    50   4584   200   98.92   95.82        98.92
P               2458       4566    57   4509     6   98.75   99.86        98.75
a                 75         75     4     71    27   94.66   72.45        94.66
!                236        236    25    211    18   89.40   92.14        89.40
F                401        401    13    388    94   96.76   80.49        96.76
x                 97         96    11     85     0   88.54     100        88.54
j                115        114     3    111    15   97.37   88.10        97.37
f                491        491     7    484    41   98.57   92.19        98.57
E                 53         53     4     49     0   92.45     100        92.45
J                 42         41     2     39     2   95.12   95.12        95.12
e                  8          8     5      3     0   37.50     100        37.50
Q                 17         16     9      7     0   43.75     100        43.75
Total          23996      86113   904  85209   904   98.95   98.95        98.95





The performance of the proposed methodology is assessed by computing TP, FP and FN from Table 2 and using (12) to evaluate the sensitivity and positive predictivity of each class of ECG signal, as presented in Table 3. The overall sensitivity and positive predictivity of the proposed methodology, pooled over all test beats, are both 98.95%.
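Why the pooled Se and Pp both coincide with the overall accuracy can be read off Table 2: every misclassified beat counts once as an FN for its true class and once as an FP for its predicted class, so the FN and FP totals in Table 3 both equal the number of misclassified beats. A short worked derivation using the Table 2 totals:

$$\sum_{c} FN_c = \sum_{c} FP_c = 86113 - 85209 = 904$$

$$Se_{\text{overall}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FN_c}\times 100 = \frac{85209}{85209 + 904}\times 100 \approx 98.95\%$$

$$P_{p,\text{overall}} = \frac{\sum_c TP_c}{\sum_c TP_c + \sum_c FP_c}\times 100 = \frac{85209}{86113}\times 100 \approx 98.95\%$$

These pooled values differ from an unweighted mean of the per-class Se column, which would be pulled down by small classes such as 'e' and 'Q'.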

Table 4 compares the proposed method with conventional works reported in the literature [8-10], [29], [49-51] on the basis of the number of ECG signal classes classified and the classification accuracy. Although its noise immunity was better, the Hilbert transform used in [52] could not yield exact R-R intervals due to variation in the R-R interval between beats. Unlike [53], which detects only one class of arrhythmia, the proposed method classifies 16 classes of ECG signals with higher classification accuracy than the other works reported in the literature. All the works taken for comparison suffer either from insufficient accuracy or from a smaller number of classes. In [50], although an acceptable accuracy is obtained for all 16 arrhythmia classes, the experiments use 66% of the data for training and only 33% for testing. Training with less data and fewer features is more efficient, provided that high classification accuracy is maintained. In the proposed work, only 21.8% of the data are used for training and classification is performed with only 13 features.


Table 4. Comparison with Conventional Research Works

Literature          Approach                                   Classes  Accuracy (%)
Melgani [5]         Morphology + PCA + SVM                           6         91.67
Li et al. [46]      PCA + k-ICA + SVM                                -         97.78
Osowski [6]         Higher order statistics + Hermite + SVM         13         95.91
Raj et al. [13]     Wavelet + BPNN                                   8         97.40
Rodriguez [47]      Morphology + Decision                           16         96.13
Martis et al. [48]  PCA + LS-SVM                                     5         93.48
Raj & Ray [29]      DCT based DOST + SVM-PSO                        16         98.82
Proposed            Chaotic + Statistical metrics + RR + SVM        16         98.95





4. CONCLUSION 

The major hurdles tackled in this paper include the selection of an appropriate R-peak detection algorithm, the novel application of statistical features such as the kurtosis, skewness and standard deviation of the R-R window (used, to the authors' knowledge, for the first time in ECG data mining for abnormality detection), and chaotic metrics. This paper has presented an automated ECG signal analysis scheme for long-term monitoring and for analyzing the nonstationary behavior of ECG signals. A new statistical and chaos-based feature extraction methodology is proposed to produce features representing the dynamic variations in ECG morphology. Finally, 13 features obtained by combining the statistical, chaotic and R-R interval features are used to predict 16 ECG signal classes with the SVM classifier. It should be noted that no optimization method has been used to fine-tune the classifier. The proposed method achieves an improved accuracy of 98.95% on the benchmark MIT-BIH arrhythmia database. Future work may extend the classification to use still fewer features for arrhythmia analysis. Time consumption has not been analyzed in this paper; however, for real-time ECG processing, parallel processing schemes and FPGA or ASIC implementations would reduce the overall computation time in both online and offline modes.